Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis
Author: Minshu Zhao
Abstract
Title of dissertation: Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis
Minshu Zhao, Doctor of Philosophy, 2015
Dissertation directed by: Professor Donald Yeung, Department of Electrical and Computer Engineering
Directories are a key part of a processor's cache coherence hardware and constitute one of the main bottlenecks in multicore processor scaling, e.g., core count and cache size scaling. Many research efforts have tried to improve the scalability of the directory, but most of them simulate only a few architecture configurations. As CPUs continue to scale, it is important to study the directory's architecture dependency, because directory behavior is highly sensitive to the architecture as well as to the application. Varying the core count directly affects the amount of sharing in the directory, and varying the data cache hierarchy affects the directory access stream. Unfortunately, exploring the huge design space of core counts and cache configurations is challenging with traditional architectural simulation because simulation is slow. This thesis studies the directory using multicore reuse distance analysis. It extends existing multicore reuse distance techniques, developing a method to extract directory access information from the parallel LRU stacks used to acquire private-stack reuse distance profiles. The thesis implements this method in a PIN-based profiler to study directory behavior, including the directory access pattern and directory contents, and to analyze current directory techniques. The profiling results show that directory accesses are highly dependent on cache size, exhibiting a 3.5x drop when the data cache size scales from 16KB to 1MB; that sharing causes the ratio of directory entries to cache blocks to drop below 50%; and that the majority of accesses go to a small percentage of the directory entries.
Cache simulations are performed to validate the profiling results, showing that the profiled results are within 14.5% of simulation on average. This thesis also analyzes different directory techniques using the insights from the profiler. Case studies on the Cuckoo, DGD, and SCD techniques and on multi-level directories show, respectively, that the required directory size varies significantly with CPU scaling, that the opportunity for compressing private data decreases with cache scaling, that reducing the sharer list size is an effective technique, and that a small L1 directory is sufficient to capture most of the latency-critical accesses.
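The idea of reading directory information off per-core LRU stacks can be illustrated with a minimal sketch. This is not the thesis's PIN-based profiler: the function name `private_stack_profile`, the trace format of `(core_id, block_address)` pairs, and the fully associative LRU cache model are all assumptions made for illustration, and write invalidations between cores are deliberately ignored. The sketch keeps one LRU stack per core, records a reuse distance histogram, and then counts how many distinct blocks (directory entries) cover the blocks held in all private caches of a given capacity.

```python
from collections import OrderedDict


def private_stack_profile(trace, num_cores, capacity):
    """Hypothetical sketch: per-core LRU stacks, no coherence invalidations.

    trace    -- iterable of (core_id, block_address) pairs (assumed format)
    capacity -- blocks per private cache, must be >= 1 (fully associative LRU)
    Returns (reuse distance histogram, total cached blocks, directory entries).
    """
    stacks = [OrderedDict() for _ in range(num_cores)]
    histogram = {}
    for core, block in trace:
        stack = stacks[core]
        if block in stack:
            # Reuse distance: distinct blocks this core touched since the
            # previous access to `block` (the most recent block is last).
            order = list(stack)
            distance = len(order) - 1 - order.index(block)
            stack.move_to_end(block)
        else:
            distance = float("inf")  # first touch by this core (cold access)
            stack[block] = True
        histogram[distance] = histogram.get(distance, 0) + 1

    # The top `capacity` entries of each stack model that core's cache contents.
    cached = [set(list(stack)[-capacity:]) for stack in stacks]
    cache_blocks = sum(len(blocks) for blocks in cached)
    # One directory entry per distinct block cached anywhere; shared blocks
    # need only one entry, which pushes the entry-to-block ratio below 1.
    directory_entries = len(set().union(*cached))
    return histogram, cache_blocks, directory_entries


trace = [(0, "A"), (0, "B"), (0, "A"), (1, "A"), (1, "C"), (0, "C")]
hist, blocks, entries = private_stack_profile(trace, num_cores=2, capacity=2)
print(hist, blocks, entries, entries / blocks)
```

In this toy trace both cores end up caching the same two blocks, so four cached blocks need only two directory entries. A real profiler would also have to model invalidations on writes, which this sketch omits.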
Similar resources
Using Multicore Reuse Distance to Study Coherence Directories
Researchers have proposed numerous techniques to improve the scalability of coherence directories. The effectiveness of these techniques not only depends on application behavior, but also on the CPU’s configuration, e.g. its core count and cache size. As CPUs continue to scale, it is essential to explore the directory’s application and architecture dependences. However, this is challenging give...
Reuse Distance Analysis for Large-Scale Chip Multiprocessors
Title of dissertation: REUSE DISTANCE ANALYSIS FOR LARGE-SCALE CHIP MULTIPROCESSORS Meng-Ju Wu, Doctor of Philosophy, 2012 Dissertation directed by: Professor Donald Yeung Department of Electrical and Computer Engineering Multicore Reuse Distance (RD) analysis is a powerful tool that can potentially provide a parallel program’s detailed memory behavior. Concurrent Reuse Distance (CRD) and Priva...
Memory Performance Analysis for Parallel Programs Using Concurrent Reuse Distance
Performance on multicore processors is determined largely by on-chip cache. Computer architects have conducted numerous studies in the past that vary core count and cache capacity as well as problem size to understand impact on cache behavior. These studies are very costly due to the combinatorial design spaces they must explore. Reuse distance (RD) analysis can help architects explore multicor...
Studying Directory Access Patterns via Reuse Distance Analysis and Evaluating Their Impact on Multi-Level Directory Caches
The trend for multicore CPUs is towards increasing core count. One of the key limiters to scaling will be the on-chip directory cache. Our work investigates moving portions of the directory away from the cores, perhaps to off-chip DRAM, where ample capacity exists. While such multi-level directory caches exhibit increased latency, several aspects of directory accesses will shield CPU performance...
Understanding Multicore Cache Behavior of Loop-based Parallel Programs via Reuse Distance Analysis
Understanding multicore memory behavior is crucial, but can be challenging due to the cache hierarchies employed in modern CPUs. In today’s hierarchies, performance is determined by complex thread interactions, such as interference in shared caches and replication and communication in private caches. Researchers normally perform simulation to sort out these interactions, but this can be costly ...